{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "    <p style=\"text-align:center\">\n",
    "        <img alt=\"phoenix logo\" src=\"https://raw.githubusercontent.com/Arize-ai/phoenix-assets/9e6101d95936f4bd4d390efc9ce646dc6937fb2d/images/socal/github-large-banner-phoenix.jpg\" width=\"1000\"/>\n",
    "        <br>\n",
    "        <br>\n",
    "        <a href=\"https://arize.com/docs/phoenix/\">Docs</a>\n",
    "        |\n",
    "        <a href=\"https://github.com/Arize-ai/phoenix\">GitHub</a>\n",
    "        |\n",
    "        <a href=\"https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email\">Community</a>\n",
    "    </p>\n",
    "</center>\n",
    "<h1 align=\"center\">Instrumenting a chatbot with human feedback</h1>\n",
    "\n",
    "Phoenix provides endpoints to associate user-provided feedback directly with OpenInference spans as annotations.\n",
    "\n",
    "In this tutorial, we will create a manually-instrument chatbot with user-triggered \"👍\" and \"👎\" feedback buttons. We will have those buttons trigger a callback that sends the user feedback to Phoenix and is viewable alongside the span. Automating associating feedback with spans is a powerful way to quickly focus on traces of your application that are not behaving as expected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -q arize-phoenix-otel \"arize-phoenix-client>=1.5.0\" gradio"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from getpass import getpass\n",
    "from typing import Any, Dict\n",
    "from uuid import uuid4\n",
    "\n",
    "import httpx\n",
    "\n",
    "from phoenix.client import Client\n",
    "from phoenix.otel import register"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not (openai_api_key := os.getenv(\"OPENAI_API_KEY\")):\n",
    "    openai_api_key = getpass(\"🔑 Enter your OpenAI API key: \")\n",
    "\n",
    "if not (phoenix_api_key := os.getenv(\"PHOENIX_API_KEY\")):\n",
    "    phoenix_api_key = getpass(\"🔑 Enter your Phoenix API key: \")\n",
    "\n",
    "os.environ[\"PHOENIX_CLIENT_HEADERS\"] = f\"api_key={phoenix_api_key}\"\n",
    "os.environ[\"PHOENIX_COLLECTOR_ENDPOINT\"] = \"https://app.phoenix.arize.com\"\n",
    "os.environ[\"PHOENIX_PROJECT_NAME\"] = \"Chatbot with Annotations\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define endpoints and configure OpenTelemetry tracing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tracer_provider = register()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "FEEDBACK_ENDPOINT = f\"{os.environ['PHOENIX_COLLECTOR_ENDPOINT']}/span_annotations\"\n",
    "OPENAI_API_URL = \"https://api.openai.com/v1/chat/completions\"\n",
    "tracer = tracer_provider.get_tracer(__name__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define and instrument chat service backend\n",
    "\n",
    "Here we define two functions:\n",
    "\n",
    "`generate_response` is a function that contains the chatbot logic for responding to a user query. `generate_response` is manually instrumented using the `OpenInference` semantic conventions. More information on how to manually instrument an application can be found [here](https://arize.com/docs/phoenix/tracing/how-to-tracing/manual-instrumentation). `generate_response` also returns the OpenTelemetry spanID, a hex-encoded string that is used to associate feedback with a specific trace.\n",
    "\n",
    "`send_feedback` is a function that sends user feedback to Phoenix via the `span_annotations` REST route."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "client = Client()\n",
    "http_client = httpx.Client()\n",
    "\n",
    "\n",
    "def generate_response(\n",
    "    input_text: str, model: str = \"gpt-3.5-turbo\", temperature: float = 0.1\n",
    ") -> Dict[str, Any]:\n",
    "    user_message = {\"role\": \"user\", \"content\": input_text, \"uuid\": str(uuid4())}\n",
    "    invocation_parameters = {\"temperature\": temperature}\n",
    "    payload = {\n",
    "        \"model\": model,\n",
    "        **invocation_parameters,\n",
    "        \"messages\": [user_message],\n",
    "    }\n",
    "    headers = {\n",
    "        \"Content-Type\": \"application/json\",\n",
    "        \"Authorization\": f\"Bearer {openai_api_key}\",\n",
    "    }\n",
    "    with tracer.start_as_current_span(\"llm_span\", openinference_span_kind=\"llm\") as span:\n",
    "        span.set_input(user_message)\n",
    "\n",
    "        # get the active hex-encoded spanID\n",
    "        span_id = span.get_span_context().span_id.to_bytes(8, \"big\").hex()\n",
    "        print(span_id)\n",
    "\n",
    "        response = http_client.post(OPENAI_API_URL, headers=headers, json=payload)\n",
    "\n",
    "        if not (200 <= response.status_code < 300):\n",
    "            raise Exception(f\"Failed to call OpenAI API: {response.text}\")\n",
    "        response_json = response.json()\n",
    "\n",
    "        span.set_output(response_json)\n",
    "\n",
    "        return response_json, span_id\n",
    "\n",
    "\n",
    "def send_feedback(span_id: str, feedback: int, user_id: str) -> None:\n",
    "    label = \"👍\" if feedback == 1 else \"👎\"\n",
    "    client.annotations.add_span_annotation(\n",
    "        span_id=span_id,\n",
    "        annotation_name=\"user_feedback\",\n",
    "        label=label,\n",
    "        score=feedback,\n",
    "        metadata={\"example_key\": \"123\"},\n",
    "        identifier=user_id,\n",
    "    )\n",
    "    print(f\"Feedback sent for span_id {span_id}: {label}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define an LLM evaluator to run on incorrect responses"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def run_llm_eval(span_id: str, input_text: str, assistant_content: str):\n",
    "    \"\"\"\n",
    "    Evaluates the quality of an LLM response by asking another LLM to classify its correctness.\n",
    "\n",
    "    Args:\n",
    "        span_id: The ID of the span to evaluate\n",
    "        input_text: The original unchanged user query\n",
    "        assistant_content: The assistant's response to evaluate\n",
    "    \"\"\"\n",
    "    # Create a prompt for the evaluation model\n",
    "    eval_prompt = f\"\"\"\n",
    "    You are an expert evaluator of AI assistant responses. Please evaluate the following:\n",
    "\n",
    "    User Query: {input_text}\n",
    "\n",
    "    Assistant Response: {assistant_content}\n",
    "\n",
    "    Is this response correct, helpful, and appropriate for the user query?\n",
    "    Provide a brief analysis and then classify as either \"CORRECT\" or \"INCORRECT\".\n",
    "\n",
    "    Format your response as follows:\n",
    "    Analysis: [Your analysis here]\n",
    "    Classification: [CORRECT or INCORRECT]\n",
    "    \"\"\"\n",
    "\n",
    "    # Call the evaluation model using the OpenAI API\n",
    "\n",
    "    headers = {\n",
    "        \"Content-Type\": \"application/json\",\n",
    "        \"Authorization\": f\"Bearer {openai_api_key}\",\n",
    "    }\n",
    "\n",
    "    payload = {\n",
    "        \"model\": \"gpt-4o\",  # Using a smaller model for evaluation\n",
    "        \"messages\": [{\"role\": \"user\", \"content\": eval_prompt}],\n",
    "    }\n",
    "\n",
    "    # Increased timeout to prevent ReadTimeout errors\n",
    "    eval_response = http_client.post(OPENAI_API_URL, headers=headers, json=payload, timeout=60.0)\n",
    "    eval_response = eval_response.json()\n",
    "    print(eval_response)\n",
    "    eval_content = eval_response[\"choices\"][0][\"message\"][\"content\"]\n",
    "\n",
    "    # Store the evaluation as an annotation\n",
    "    client.annotations.add_span_annotation(\n",
    "        span_id=span_id,\n",
    "        annotation_name=\"correctness\",\n",
    "        annotator_kind=\"LLM\",\n",
    "        label=\"INCORRECT\" if \"Classification: INCORRECT\" in eval_content else \"CORRECT\",\n",
    "        score=1 if \"Classification: INCORRECT\" in eval_content else 0,\n",
    "        explanation=eval_content,\n",
    "    )\n",
    "\n",
    "    print(f\"LLM Evaluation for span_id {span_id}:\")\n",
    "    print(eval_content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Chat Widget\n",
    "\n",
    "We create a simple chat application using IPython widgets. Alongside the chatbot responses we provide feedback buttons that a user can click to provide feedback. These can be seen inside the Phoenix UI!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def create_gradio_chat():\n",
    "    import gradio as gr\n",
    "\n",
    "    def chat_response(message, history, user_id):\n",
    "        # Send the message to the OpenAI API and get the response\n",
    "        response_data, span_id = generate_response(message)\n",
    "        assistant_content = response_data[\"choices\"][0][\"message\"][\"content\"]\n",
    "\n",
    "        # Store the span_id for feedback\n",
    "        return assistant_content, span_id\n",
    "\n",
    "    def submit_feedback(feedback_type, span_id, message, response, user_id):\n",
    "        if feedback_type == \"positive\":\n",
    "            send_feedback(span_id, 1, user_id)\n",
    "            return \"Thanks for your positive feedback! We'll use it to improve our assistant.\"\n",
    "        else:  # negative feedback\n",
    "            send_feedback(span_id, 0, user_id)\n",
    "            run_llm_eval(span_id, message, response)\n",
    "            return \"Thanks for your feedback. We'll work on improving this type of response.\"\n",
    "\n",
    "    with gr.Blocks() as demo:\n",
    "        gr.HTML(\"<h3>Encyclopedia Chatbot</h3>\")\n",
    "        gr.HTML(\n",
    "            \"<p>Welcome to the Encyclopedia Chatbot. Ask any question about the world, and provide feedback to help us improve!</p>\"\n",
    "        )\n",
    "\n",
    "        user_id = gr.Dropdown(\n",
    "            choices=[\"user1\", \"user2\", \"user3\", \"user4\", \"user5\"], value=\"user1\", label=\"User ID\"\n",
    "        )\n",
    "\n",
    "        chatbot = gr.Chatbot(height=400)\n",
    "        msg = gr.Textbox(placeholder=\"Type your message here...\")\n",
    "\n",
    "        # Hidden state to store the current span_id\n",
    "        current_span_id = gr.State(\"\")\n",
    "        feedback_message = gr.Markdown(\"\")\n",
    "\n",
    "        def respond(message, chat_history, user_id):\n",
    "            # Get bot response\n",
    "            bot_response, span_id = chat_response(message, chat_history, user_id)\n",
    "\n",
    "            # Update chat history\n",
    "            chat_history.append((message, bot_response))\n",
    "\n",
    "            return \"\", chat_history, span_id\n",
    "\n",
    "        # Send button\n",
    "        msg.submit(respond, [msg, chatbot, user_id], [msg, chatbot, current_span_id])\n",
    "\n",
    "        with gr.Row():\n",
    "            thumbs_up = gr.Button(\"👍\", scale=1)\n",
    "            thumbs_down = gr.Button(\"👎\", scale=1)\n",
    "\n",
    "        # Feedback handlers\n",
    "        def handle_positive_feedback(span_id, chat_history, user_id):\n",
    "            if not chat_history:\n",
    "                return \"No message to provide feedback on.\"\n",
    "\n",
    "            last_user_msg, last_bot_msg = chat_history[-1]\n",
    "            return submit_feedback(\"positive\", span_id, last_user_msg, last_bot_msg, user_id)\n",
    "\n",
    "        def handle_negative_feedback(span_id, chat_history, user_id):\n",
    "            if not chat_history:\n",
    "                return \"No message to provide feedback on.\"\n",
    "\n",
    "            last_user_msg, last_bot_msg = chat_history[-1]\n",
    "            return submit_feedback(\"negative\", span_id, last_user_msg, last_bot_msg, user_id)\n",
    "\n",
    "        thumbs_up.click(\n",
    "            handle_positive_feedback, [current_span_id, chatbot, user_id], feedback_message\n",
    "        )\n",
    "\n",
    "        thumbs_down.click(\n",
    "            handle_negative_feedback, [current_span_id, chatbot, user_id], feedback_message\n",
    "        )\n",
    "\n",
    "    return demo\n",
    "\n",
    "\n",
    "# Create and display the Gradio interface\n",
    "demo = create_gradio_chat()\n",
    "demo.launch(inline=True, share=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Analyze feedback using the Phoenix Client\n",
    "\n",
    "We can use the Phoenix client to pull the annotated spans. By combining `get_spans_dataframe`\n",
    "and `get_span_annotations_dataframe` we can create a dataframe of all annotations alongside\n",
    "span data for analysis!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "spans_df = client.spans.get_spans_dataframe(project_identifier=os.environ[\"PHOENIX_PROJECT_NAME\"])\n",
    "annotations_df = client.spans.get_span_annotations_dataframe(\n",
    "    spans_dataframe=spans_df, project_identifier=os.environ[\"PHOENIX_PROJECT_NAME\"]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "annotations_df.join(spans_df, how=\"inner\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "client.spans.get_span_annotations(\n",
    "    span_ids=spans_df.index, project_identifier=os.environ[\"PHOENIX_PROJECT_NAME\"]\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
